On the use of visual information for improving audio-based speaker recognition
Authors
Abstract
Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions, due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions.
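As a rough illustration of what fusing the two modalities can look like, the sketch below combines per-speaker scores from an audio system and a video system with a weighted sum and picks the best-scoring speaker. The abstract does not say which fusion techniques the paper evaluates, so this is an assumption chosen for simplicity, not the authors' method. The function names, the `audio_weight` parameter, and the example scores are all hypothetical, and the scores from both systems are assumed to be already normalised to a comparable scale.

```python
def fuse_scores(audio_scores, video_scores, audio_weight=0.5):
    """Weighted-sum (late) fusion of per-speaker scores.

    Illustrative only: both inputs are dicts mapping a speaker label to a
    score where higher means a better match, assumed to be on a comparable
    scale. audio_weight is a hypothetical tuning parameter in [0, 1].
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    fused = {}
    # Only speakers scored by both systems can be fused.
    for speaker in audio_scores.keys() & video_scores.keys():
        fused[speaker] = (audio_weight * audio_scores[speaker]
                          + (1.0 - audio_weight) * video_scores[speaker])
    return fused


def identify(audio_scores, video_scores, audio_weight=0.5):
    """Return the speaker label with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    if not fused:
        raise ValueError("no speaker was scored by both systems")
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up scores: the audio system slightly prefers "bob",
    # while the video system clearly prefers "alice".
    audio = {"alice": -10.0, "bob": -9.0}
    video = {"alice": -2.0, "bob": -8.0}
    print(identify(audio, video, audio_weight=0.5))
    print(identify(audio, video, audio_weight=1.0))
```

Lowering `audio_weight` shifts trust towards the video system, which is one plausible way to react when the audio channel is known to be noisy or mismatched.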
Similar resources
Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics. We propose the use of speaker-targeted acoustic and audio-visual models for this task. We...
Automatic speechreading of impaired speech
We investigate the use of visual, mouth-region information in improving automatic speech recognition (ASR) of the speech impaired. Given the video of an utterance by such a subject, we first extract appearance-based visual features from the mouth region-of-interest, and we use a feature fusion method to combine them with the subject’s audio features into bimodal observations. Subsequently, we a...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
An Automated System for Visual Biometrics
Biometrics has been a topic of great interest since the advent of the information age and will soon lead to a safer and simpler lifestyle where passcodes and keys are inherent to the user. We describe a system capable of automatically extracting visual features from a human face for use in dynamic visual biometrics. Automatic speech and speaker recognition has recently moved towards incorporati...